AITopics | perspective transformation

perspective transformation

Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision

Xinchen Yan, Jimei Yang, Ersin Yumer, Yijie Guo, Honglak Lee

Neural Information Processing Systems | Apr-22-2026, 11:27:08 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, category, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)

Pic2Diagnosis: A Method for Diagnosis of Cardiovascular Diseases from the Printed ECG Pictures

Büyüksolak, Oğuzhan, Öksüz, İlkay

arXiv.org Artificial Intelligence | Dec-9-2025

The electrocardiogram (ECG) is a vital tool for diagnosing heart diseases. However, many disease patterns are derived from outdated datasets and traditional stepwise algorithms with limited accuracy. This study presents a method for direct cardiovascular disease (CVD) diagnosis from ECG images, eliminating the need for digitization. The proposed approach utilizes a two-step curriculum learning framework, beginning with the pre-training of a classification model on segmentation masks, followed by fine-tuning on grayscale, inverted ECG images. Robustness is further enhanced through an ensemble of three models with averaged outputs, achieving an AUC of 0.9534 and an F1 score of 0.7801 on the BHF ECG Challenge dataset, outperforming individual models. By effectively handling real-world artifacts and simplifying the diagnostic process, this method offers a reliable solution for automated CVD diagnosis, particularly in resource-limited settings where printed or scanned ECG images are commonly used. Such an automated procedure enables rapid and accurate diagnosis, which is critical for timely intervention in CVD cases that often demand urgent care.
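The abstract names two ingredients: grayscale, inverted inputs and an ensemble with averaged outputs. Both can be illustrated with a minimal Python sketch. The function names, the luma weights, and the example probabilities are assumptions made for illustration; the abstract does not specify the conversion formula or the model architecture.

```python
def to_inverted_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels in 0..255 to inverted grayscale values."""
    out = []
    for row in rgb_image:
        out_row = []
        for r, g, b in row:
            # BT.601 luma weights: an assumption, not taken from the abstract.
            gray = 0.299 * r + 0.587 * g + 0.114 * b
            # Inversion turns dark traces on light paper into bright traces.
            out_row.append(255.0 - gray)
        out.append(out_row)
    return out


def average_ensemble(per_model_probs):
    """Average per-class probabilities over models (one list of floats per model)."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [
        sum(probs[c] for probs in per_model_probs) / n_models
        for c in range(n_classes)
    ]


if __name__ == "__main__":
    # Hypothetical outputs of three models for two disease classes.
    print(average_ensemble([[0.9, 0.2], [0.6, 0.4], [0.3, 0.6]]))
```

Averaging probabilities rather than hard labels keeps each model's confidence in the final score, which matters for a threshold-free metric such as AUC.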

artificial intelligence, ecg image, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/EMBC58623.2025.11254054

2507.19961

Country:

Asia > Middle East > Republic of Türkiye (0.29)
Europe (0.29)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Resource-Efficient Multiview Perception: Integrating Semantic Masking with Masked Autoencoders

Dakic, Kosta, Thilakarathna, Kanchana, Calheiros, Rodrigo N., Lim, Teng Joon

arXiv.org Artificial Intelligence | Oct-7-2024

Multiview systems have become a key technology in modern computer vision, offering advanced capabilities in scene understanding and analysis. However, these systems face critical challenges in bandwidth limitations and computational constraints, particularly for resource-limited camera nodes like drones. This paper presents a novel approach for communication-efficient distributed multiview detection and tracking using masked autoencoders (MAEs). We introduce a semantic-guided masking strategy that leverages pre-trained segmentation models and a tunable power function to prioritize informative image regions. This approach, combined with an MAE, reduces communication overhead while preserving essential visual information. We evaluate our method on both virtual and real-world multiview datasets, demonstrating detection and tracking performance comparable to state-of-the-art techniques, even at high masking ratios. Our selective masking algorithm outperforms random masking, maintaining higher accuracy and precision as the masking ratio increases. Furthermore, our approach achieves a significant reduction in transmission data volume compared to baseline methods, thereby balancing multiview tracking performance with communication efficiency.
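One way to picture a semantic-guided masking strategy with a tunable power function is the sketch below. It is not the authors' algorithm. It assumes each image patch already has a score in [0, 1] (for example, the fraction of foreground pixels reported by a segmentation model), raises that score to a power `gamma`, and samples the patches to keep in proportion to the result. The names `select_patches_to_keep`, `gamma`, `eps`, and `seed` are hypothetical.

```python
import random


def select_patches_to_keep(patch_scores, mask_ratio, gamma=2.0, seed=0, eps=1e-6):
    """Return sorted indices of patches kept (not masked).

    patch_scores: per-patch semantic scores in [0, 1] (assumed input).
    mask_ratio:   fraction of patches to mask out.
    gamma:        exponent of the power function; a larger value concentrates
                  the kept patches on high-scoring regions.
    """
    rng = random.Random(seed)
    n_keep = max(1, round(len(patch_scores) * (1.0 - mask_ratio)))
    # eps keeps zero-score patches selectable with a tiny probability.
    weights = [(s + eps) ** gamma for s in patch_scores]
    candidates = list(range(len(patch_scores)))
    kept = []
    for _ in range(n_keep):
        # Weighted sampling without replacement, one patch at a time.
        pos = rng.choices(
            range(len(candidates)),
            weights=[weights[i] for i in candidates],
            k=1,
        )[0]
        kept.append(candidates.pop(pos))
    return sorted(kept)


if __name__ == "__main__":
    scores = [0.0, 1.0, 0.0, 1.0]  # hypothetical foreground fractions
    print(select_patches_to_keep(scores, mask_ratio=0.5))
```

With `gamma = 0` every patch gets the same weight and the procedure reduces to random masking, the baseline the abstract compares against.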

camera node, dataset, detection, (15 more...)

arXiv.org Artificial Intelligence

2410.04817

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland (0.04)

Genre:

Research Report > Promising Solution (1.00)
Overview (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision

Neural Information Processing Systems | Mar-12-2024, 18:58:35 GMT

Understanding the 3D world is a fundamental problem in computer vision. However, learning a good representation of 3D objects is still an open problem due to the high dimensionality of the data and many factors of variation involved. In this work, we investigate the task of single-view 3D object reconstruction from a learning agent's perspective. We formulate the learning process as an interaction between 3D and 2D representations and propose an encoder-decoder network with a novel projection loss defined by the perspective transformation. More importantly, the projection loss enables the unsupervised learning using 2D observation without explicit 3D supervision. We demonstrate the ability of the model in generating 3D volume from a single 2D image with three sets of experiments: (1) learning from single-class objects; (2) learning from multi-class objects and (3) testing on novel object classes. Results show superior performance and better generalization ability for 3D object reconstruction when the projection loss is involved.
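The idea of a projection loss defined by a perspective transformation can be sketched in plain Python. This is a forward-only illustration under stated assumptions, not the paper's implementation. The voxel grid is assumed to fill the cube [-1, 1]^3, and a pinhole camera sits at distance `cam_dist` on the negative z axis. Rays are sampled with nearest-neighbour lookup, and the silhouette is the maximum occupancy along each ray. All names and default values are hypothetical, and a trainable version would need differentiable sampling.

```python
import math


def project_silhouette(voxels, focal=2.0, cam_dist=3.0, image_size=8, n_depth=16):
    """Render a silhouette from a cubic occupancy grid voxels[z][y][x]."""
    n = len(voxels)

    def to_index(coord):
        # Map a world coordinate in [-1, 1) to a voxel index (may be out of range).
        return math.floor((coord + 1.0) / 2.0 * n)

    silhouette = [[0.0] * image_size for _ in range(image_size)]
    for v in range(image_size):
        for u in range(image_size):
            # Pixel centre on the image plane, normalised to [-1, 1].
            px = (u + 0.5) / image_size * 2.0 - 1.0
            py = (v + 0.5) / image_size * 2.0 - 1.0
            best = 0.0
            for k in range(n_depth):
                # Depth measured from the camera, covering the cube's z extent.
                depth = (cam_dist - 1.0) + (k + 0.5) * 2.0 / n_depth
                # Perspective: the ray spreads out in proportion to depth.
                ix = to_index(px * depth / focal)
                iy = to_index(py * depth / focal)
                iz = to_index(depth - cam_dist)
                if 0 <= ix < n and 0 <= iy < n and 0 <= iz < n:
                    best = max(best, voxels[iz][iy][ix])
            silhouette[v][u] = best
    return silhouette


def projection_loss(silhouette, target_mask):
    """Mean squared error between a rendered silhouette and a 2D mask."""
    total, count = 0.0, 0
    for row_s, row_t in zip(silhouette, target_mask):
        for s, t in zip(row_s, row_t):
            total += (s - t) ** 2
            count += 1
    return total / count
```

Because the loss compares only 2D silhouettes, it needs no ground-truth 3D volume, which is the sense in which the abstract speaks of learning without explicit 3D supervision.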

category, supervision, transformation, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation

Katiyar, Sulabh, Borgohain, Samir Kumar

arXiv.org Artificial Intelligence | Feb-22-2021

Image Captioning, or the automatic generation of descriptions for images, is one of the core problems in Computer Vision and has seen considerable progress using Deep Learning techniques. We propose to use an Inception-ResNet Convolutional Neural Network as the encoder to extract features from images, Hierarchical Context based Word Embeddings for word representations, and a Deep Stacked Long Short Term Memory network as the decoder, in addition to using Image Data Augmentation to avoid over-fitting. For Data Augmentation, we use Horizontal and Vertical Flipping in addition to Perspective Transformations on the images. We evaluate our proposed methods with two image captioning frameworks: Encoder-Decoder and Soft Attention. Evaluation on widely used metrics has shown that our approach leads to considerable improvement in model performance.
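The three augmentations mentioned (horizontal flip, vertical flip, and perspective transformation) can be sketched on a nested-list image. This is a minimal sketch, not the authors' pipeline. The function names are hypothetical, and the warp assumes the caller supplies the inverse 3x3 homography (mapping output pixels to input pixels) and uses nearest-neighbour sampling.

```python
def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def apply_homography(H, x, y):
    """Map point (x, y) through the 3x3 matrix H with a homogeneous divide."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (
        (H[0][0] * x + H[0][1] * y + H[0][2]) / w,
        (H[1][0] * x + H[1][1] * y + H[1][2]) / w,
    )


def warp_perspective(img, H_inv, fill=0):
    """Warp img by looking up, for every output pixel, its source pixel.

    H_inv maps output coordinates to input coordinates (assumed given).
    Pixels whose source falls outside the image are set to `fill`.
    """
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = apply_homography(H_inv, x, y)
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out
```

The homogeneous divide by `w` is what distinguishes a perspective transformation from an affine one: when the bottom row of `H` is `[0, 0, 1]`, the divide has no effect and the warp is affine.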

image captioning, lstm, representation, (17 more...)

arXiv.org Artificial Intelligence

2102.11237

Country: Asia > India (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Data Set and Data Augmentation for Face Detection and Recognition

#artificialintelligence | Jan-14-2020, 17:39:51 GMT

When it comes to building an Artificially Intelligent (AI) application, your approach must be data first, not application first. Dependencies on data cost more than software dependencies, but are constantly overlooked. To build a face detection and/or face recognition model, it's important to know the available data sets and the data augmentation approaches to follow when training the model. Note that there's a difference between Face Identification and Face Recognition. It is the process of comparing a face image with a claimed identity; basically, it is "one-to-one matching".

data augmentation, recognition, transformation, (13 more...)

#artificialintelligence

AI-Alerts: 2020 > 2020-01 > AAAI AI-Alert for Jan 22, 2020 (1.00)

Technology: Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)

Lane Detection with Deep Learning (Part 2) – Towards Data Science

@machinelearnbot | Jan-16-2018, 22:47:51 GMT

This is part two of my deep learning solution for lane detection, which covers the actual models I created in finding my final approach to the problem, as well as some potential improvements. Be sure to read Part One for the limitations of my previous approaches as well as the preliminary data used prior to the changes I made below. The code and data mentioned here and in the earlier post can be found in my GitHub repo. With a decent dataset created, I was ready to make my first model for using deep learning to detect lane lines. You may be asking, "Wait, I thought you were trying to get rid of perspective transformation?"

neural network, perspective transformation, road image, (13 more...)

@machinelearnbot

Industry: Education (0.58)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision

Yan, Xinchen, Yang, Jimei, Yumer, Ersin, Guo, Yijie, Lee, Honglak

Neural Information Processing Systems | Dec-31-2016

Understanding the 3D world is a fundamental problem in computer vision. However, learning a good representation of 3D objects is still an open problem due to the high dimensionality of the data and many factors of variation involved. In this work, we investigate the task of single-view 3D object reconstruction from a learning agent's perspective. We formulate the learning process as an interaction between 3D and 2D representations and propose an encoder-decoder network with a novel projection loss defined by the perspective transformation. More importantly, the projection loss enables the unsupervised learning using 2D observation without explicit 3D supervision. We demonstrate the ability of the model in generating 3D volume from a single 2D image with three sets of experiments: (1) learning from single-class objects; (2) learning from multi-class objects and (3) testing on novel object classes. Results show superior performance and better generalization ability for 3D object reconstruction when the projection loss is involved.

artificial intelligence, category, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
